Escape Excel: A tool for preventing gene symbol and accession conversion errors
Abstract
BACKGROUND: Microsoft Excel automatically converts certain gene symbols, database accessions, and other alphanumeric text into dates, scientific notation, and other numerical representations. These conversions irreversibly corrupt the imported text. A recent survey of popular genomic literature estimates that one-fifth of all papers with supplementary gene lists suffer from this issue.
RESULTS: Here, we present an open-source tool, Escape Excel, which prevents these erroneous conversions by generating an escaped text file that can be safely imported into Excel. Escape Excel is implemented in a variety of formats (http://www.github.com/pstew/escape_excel), including a command-line Perl script, a Windows-only Excel Add-In, an OS X drag-and-drop application, a simple web server, and a Galaxy web environment interface. Test server implementations are accessible as a Galaxy interface (http://apostl.moffitt.org) and as a simple non-Galaxy web server (http://apostl.moffitt.org:8000/).
CONCLUSIONS: Escape Excel detects and escapes a wide variety of problematic text strings so that they are not erroneously converted into other representations upon importation into Excel. Examples of problematic strings include date-like strings, time-like strings, numbers with leading zeroes, and long numeric and alphanumeric identifiers that should not be automatically converted into scientific notation. We hope that greater awareness of these potential data corruption issues, together with diligent escaping of text files prior to importation into Excel, will help to reduce the amount of Excel-corrupted data in scientific analyses and publications.
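The escaping idea described in the abstract can be illustrated with a minimal Python sketch. This is not the Escape Excel implementation (the tool itself is a Perl script, and its actual detection rules are not given here). The pattern list, the 12-digit threshold, and the function names `escape_field` and `escape_line` are assumptions made for illustration. The sketch wraps risky fields in the formula form `="..."`, which Excel evaluates to a text value when opening a delimited text file; whether Escape Excel uses exactly this form is also an assumption.

```python
import re

# Month names and abbreviations that Excel may read as dates (illustrative list).
MONTHS = (
    r"jan(?:uary)?|feb(?:ruary)?|mar(?:ch)?|apr(?:il)?|may|june?|july?|"
    r"aug(?:ust)?|sep(?:t(?:ember)?)?|oct(?:ober)?|nov(?:ember)?|dec(?:ember)?"
)

# Hypothetical detection rules, one per category named in the abstract.
RISKY_PATTERNS = [
    # Date-like: month followed by a number, e.g. SEPT2, MARCH1, DEC-1
    re.compile(r"^(?:" + MONTHS + r")[-/ ]?\d{1,4}$", re.IGNORECASE),
    # Date-like: number followed by a month, e.g. 1-Mar
    re.compile(r"^\d{1,4}[-/ ](?:" + MONTHS + r")$", re.IGNORECASE),
    # Date-like: purely numeric, e.g. 1/2 or 2016-03-01
    re.compile(r"^\d{1,4}[-/]\d{1,2}(?:[-/]\d{1,4})?$"),
    # Time-like, e.g. 12:30 or 1:02:03
    re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$"),
    # Numbers with leading zeroes, e.g. 00123
    re.compile(r"^0\d+$"),
    # Long numeric identifiers (assumed threshold: 12 or more digits)
    re.compile(r"^\d{12,}$"),
    # Identifiers that look like scientific notation, e.g. 2310009E13
    re.compile(r"^\d+(?:\.\d+)?[eE][+-]?\d+$"),
]


def escape_field(field: str) -> str:
    """Wrap a risky field as ="..." so Excel keeps it as text."""
    if any(p.match(field.strip()) for p in RISKY_PATTERNS):
        # Double any embedded quotes so the formula string stays valid.
        return '="' + field.replace('"', '""') + '"'
    return field


def escape_line(line: str, sep: str = "\t") -> str:
    """Escape every field of one delimited line."""
    return sep.join(escape_field(f) for f in line.split(sep))


if __name__ == "__main__":
    print(escape_line("MARCH1\tTP53\t0042\t2310009E13"))
```

Fields that match none of the patterns pass through unchanged, so the escaped file differs from the input only where a conversion risk was detected. A false positive costs only an unnecessary wrapper, whereas a missed field would be silently corrupted on import, so a rule set like this is better kept broad.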
Similar resources
Correcting Inconsistencies and Errors in Bacterial Genome Metadata Using an Automated Curation Tool in Excel (AutoCurE)
Whole-genome data are invaluable for large-scale comparative genomic studies. Current sequencing technologies have made it feasible to sequence entire bacterial genomes with relative ease and speed and at a substantially reduced cost per nucleotide, hence cost per genome. More than 3,000 bacterial genomes have been sequenced and are available at the finished status. Publicly available genomes ca...
The attitudes of nurses towards the occurrence and reporting of nursing errors in selected hospitals of Tehran University of Medical Sciences in 2019
Introduction: Nurses have an undeniable role in preventing nursing and medical errors, and evaluating their attitude towards error reporting, as a strategic indicator, can help nursing managers in preventing errors and improving patient safety and quality of nursing care. Therefore, this study was conducted with the aim of determining the attitude of nurses towards the occurrence and reporti...
Sequence-dependent gene conversion: can duplicated genes diverge fast enough to escape conversion?
Conversion between duplicated genes limits their independent evolution. Models in which conversion frequencies decrease as genes diverge are examined to determine conditions under which genes can "escape" further conversion and hence escape from a gene family. A review of results from various recombination systems suggests two classes of sequence-dependence models: (1) the "k-hit" model in whic...
Genetic analysis of polyketide synthase and peptide synthase genes of cyanobacteria as a mining tool for new pharmaceutical compounds
Cyanobacteria are considered a promising source for new pharmaceutical lead compounds, and a large number of chemically diverse and bioactive metabolites have been obtained from cyanobacteria. Despite several worldwide studies on the prevalence of NRPSs and PKSs among the cyanobacteria, none of them included Iranian cyanobacteria of Kermanshah province. Therefore, the aim of this study was t...
Error Control Techniques Using Binary Symbol Burst Codes (ESD Accession List AL 5823, September 1967, K. Brayer)
Much has been written on the theoretical description of error correcting codes but, due to a lack of actual channel error patterns, little has been said of practical performance. In this paper the performance of three types of error control is evaluated for the case of independent random errors and for an actual channel exhibiting dense bursts. The selected codes are burst codes with high proba...